HOW DO GOOGLE, BING AND YAHOO SEARCH ENGINES WORK? WHO IS WINNING THE SEARCH WAR? by LakshmiChaitanya Madiraju, Sri Rohith Talluri
SHORT ANSWER
Major parts of the world have become heavily reliant on the internet, and this can largely be attributed to globalization. Many countries make extensive use of this technology for a variety of purposes. Searching for information on the web is one of the most prominent uses of the internet, and it led to the development of search engines. Back in the 1980s, libraries and books were the main sources available for gathering information. Over time, various companies came up with ideas for efficient search engines that would meet people's needs while browsing the web. The 1990s therefore saw a steady rise in the number of search engines. Back then, companies like Yahoo and MSN stood out in the search engine market, until Google quietly overshadowed all the existing search engine companies with its unique search algorithm, giving users a closer look at the web and satisfying their thirst for information. The world has been constantly adapting to new technologies, and search engines have undergone multiple design changes to give users easy access to the information they need, which ultimately enhances their search experience. When people think of search engines, the one most users prefer is Google. The name itself has a kind of aura, and it would be an arduous task to change the preference of people who use Google for a wide range of purposes like navigation, web search and movie streaming. Before Google started its operations in 1998, the search engine that had gained the most fame was Yahoo. In recent times, Bing has been gaining popularity and presents itself as Google's only real rival in enhancing a user's search experience on the World Wide Web.
This question discusses the search algorithms of three well-known search engines, Google, Bing and Yahoo, and compares them to highlight their salient features. It also evaluates the performance of these search engines to see who is winning the search war.
LONG ANSWER
Before explaining how these particular search engines function, a basic overview of how a search engine works can be seen in figure 1. Let us start with a basic definition of the word “SEARCH”: in this context, search means looking for information. Google has redefined this term with its outstanding search algorithm. In fact, every search engine, be it Google, Bing or Yahoo, has certain common components as part of its working model. These include spiders, which fetch webpages; an index database, which stores these pages; and an algorithm, which decides how to rank the fetched pages. The basic working model of a search engine depicted in figure 1 works as follows. The web can be thought of as a spider's web with millions and millions of web pages. Spiders crawl this web ahead of time and store the fetched pages in the search engine's index database. When a user types in a query, the search engine looks up the relevant pages in this database and ranks them by relevancy using its secret ranking algorithm. The final list of ranked pages is then returned to the user.
GOOGLE
Google was introduced into the search engine market by two visionaries, Larry Page and Sergey Brin. It began its operations as a web information provider in 1998. Since its inception, Google's search algorithm has gone through many changes with the aim of giving its users a remarkable and unforgettable search experience.
Google uses three important processes to fetch, filter and rank webpages, making sure no stone is left unturned in improving the relevancy of a web search. The three vital procedures behind a Google search are:
* Crawling: Crawling is the fetching of web pages. The founders of Google developed software known as spiders, which fetch web pages and then crawl the links on those pages. This process continues until the linked pages have been fetched and dumped into an index database. Crawling does not depend on any particular search; it is done ahead of time. When a search is made, the keywords are first looked for in the URL of a page and then in its title, and synonyms of those keywords are also looked for in the fetched webpages.
* Indexing: All the crawled documents in the index database are indexed using Google's own indexing technique. The process is simple: each parsed document is assigned a unique document ID, and instead of placing the picked documents in word order, the words are placed in document order, as illustrated in example 1. As the example shows, document 1 contains DONALD but not the second word TRUMP, and documents 898 and 678 likewise do not contain both words, whereas documents 2 and 999 contain both words of “DONALD TRUMP”. All parsed documents containing these two words together are therefore picked as the pages with relevant information. The relevant pages are then ranked so that the pages most pertinent to the query appear at the top. To achieve this, Google applies its patented PageRank algorithm, which is discussed below.
* Pageranking: The ranking of pages in Google is based on an algorithm proposed by Larry Page and Sergey Brin:
$$PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \qquad (1)$$
where PR(A) is the PageRank of page A, PR(T_i) is the PageRank of a page T_i that links to page A, C(T_i) is the number of outbound links on page T_i, and d (also written θ) is a damping factor that can be set between 0 and 1 and is usually 0.85.
To compute the PageRanks of the parsed pages, an H matrix is formed from the graph of webpages shown in figure 2. Each row of H corresponds to a webpage: if page i has O_i outgoing links, then H_ij = 1/O_i for every page j that page i links to, and 0 otherwise. The H matrix computed from the graph is given below.
It is evident from the matrix that the second webpage, or node, has no outgoing links, as the second row is all zeroes. Such a node is called a dangling node, and it is corrected by replacing the zeroes in its row with 1/N, where N is the number of nodes in the graph. Since the graph has 6 nodes, the whole second row is replaced with 1/6. The H matrix with the new second row is denoted D. From D, the Google matrix G is computed with the following equation, where 1 is the column vector of all ones:
$$G = \theta D + \frac{1-\theta}{N}\,\mathbf{1}\mathbf{1}^T$$
Once G is computed, the PageRank vector π is calculated iteratively:
$$\pi^T[k] = \pi^T[k-1]\,G$$
This computation was repeated for 200 iterations using MATLAB; in practice it should be repeated until the PageRank values converge, i.e. stop changing from one iteration to the next. The values obtained over the 200 iterations are tabulated and plotted in figure 3. As the plot of the computed PageRanks for the six web pages shows, page 4 is ranked highest, while pages 5 and 6 are placed second and third respectively. The total number of iterations needed to compute the PageRanks varies depending on the query being searched.
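The three procedures above can be sketched end to end on toy data. This is a minimal Python sketch, not Google's implementation: the in-memory web `WEB` (its page texts and links), the six-page link graph `LINKS` and all function names are hypothetical, chosen only to mirror example 1 and figure 2, which are not reproduced here. Crawling is modelled as a breadth-first traversal of links, indexing as an inverted index from which the documents containing both query words are picked, and ranking as the matrix form of PageRank with the dangling-node fix, the Google matrix G = θD + (1 − θ)/N · 11ᵀ with θ = 0.85, and power iteration. In this form the ranks sum to 1, unlike the unnormalised Brin and Page formula.

```python
from collections import defaultdict, deque

# --- Crawling and indexing: hypothetical toy web (doc ID -> (text, linked doc IDs)) ---
WEB = {
    1: ("donald duck cartoon", [2, 898]),
    2: ("donald trump news", [999, 678]),
    898: ("trump card game rules", [1]),
    678: ("donald sutherland films", []),
    999: ("speech by donald trump", [2]),
}


def crawl(web, seed):
    """Breadth-first crawl: fetch a page, then follow the links on it."""
    seen, queue, fetched = {seed}, deque([seed]), {}
    while queue:
        doc_id = queue.popleft()
        text, links = web[doc_id]
        fetched[doc_id] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def build_index(docs):
    """Inverted index: each word maps to the doc IDs containing it, in doc ID order."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return the doc IDs that contain every word of the query."""
    postings = [set(index.get(w, [])) for w in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# --- PageRank: hypothetical six-page graph; page 2 is a dangling node ---
LINKS = {1: [2, 3, 4], 2: [], 3: [4, 5], 4: [5, 6], 5: [4], 6: [4, 5]}


def h_matrix(links, n):
    """H[i][j] = 1 / (out-degree of page i) if page i links to page j, else 0."""
    h = [[0.0] * n for _ in range(n)]
    for page, targets in links.items():
        for target in targets:
            h[page - 1][target - 1] = 1.0 / len(targets)
    return h


def fix_dangling(h):
    """Replace every all-zero row with 1/N, giving the matrix D."""
    n = len(h)
    return [row if any(row) else [1.0 / n] * n for row in h]


def google_matrix(d, theta=0.85):
    """G = theta * D + (1 - theta) / N * (matrix of all ones)."""
    n = len(d)
    return [[theta * d[i][j] + (1 - theta) / n for j in range(n)] for i in range(n)]


def pagerank(g, max_iter=200, tol=1e-12):
    """Power iteration pi[k] = pi[k-1] G, stopping early once the values stop changing."""
    n = len(g)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * g[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


if __name__ == "__main__":
    index = build_index(crawl(WEB, seed=1))
    print("pages with both words:", search(index, "DONALD TRUMP"))
    ranks = pagerank(google_matrix(fix_dangling(h_matrix(LINKS, 6))))
    for page, rank in sorted(enumerate(ranks, start=1), key=lambda pr: -pr[1]):
        print(f"page {page}: {rank:.3f}")
```

With these assumed pages and links, the search for DONALD TRUMP picks documents 2 and 999, and page 4 receives the highest rank, followed by pages 5 and 6. Plain Python lists keep the sketch dependency-free; for a real web graph, H would be stored as a sparse matrix and the dense matrix G would typically not be formed explicitly.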
Consider a search on 6G networks: Google would have to perform fewer iterations, as relatively little information is available on this topic. If the search is on 3G networks, on the other hand, a huge amount of data is available in Google's database, so Google has to look through a large number of pages to pick the most relevant ones; hence it will perform more iterations than for the query on 6G networks. Apart from the procedures explained above, which play a vital role in providing better search results, there are about 200 other factors that help in returning appropriate result pages to users. A few of them are:
# Anchor Text: The text of a link pointing to a page is associated with the page it points to. This text often describes the page better than the page itself does, and it also allows non-textual content such as pictures and GIF images, which cannot be parsed as text, to be indexed.
# Storing Proximity Information: The position of every word in a document is stored, so that pages in which the query words appear close together can be ranked higher.
# Font weightage: Words in a larger or bolder font are given more importance than words in a smaller font.
Google also offers other important features that help improve the search. Some of them are:
* Spelling: Google offers automatic spell checking, i.e. it automatically corrects a user who types in a misspelled word.
* Autocomplete: Google completes a query before the user has finished typing it, thereby reducing the time needed to return results.
* Google Instant: A feature that predicts a user's query and shows results as the query is being typed.
* Escape Hatch: The option a user gets after typing in a misspelled query: Google shows results for the corrected spelling and displays the message “SEARCH INSTEAD FOR” the original query.
* Query Understanding: Google works out what a query means and returns documents specific to it. This limits the number of documents that must be searched in the index database.
* Synonyms: When a user types a query, Google does not limit itself to searching for the keywords; it also searches for synonyms of those keywords.
ARCHITECTURE OF GOOGLE
As seen in figure 4, the architecture of Google is simple. A URL server sends lists of URLs to the crawlers. The crawlers fetch these pages, irrespective of relevancy, and send them to a store server, which compresses them and saves them in a repository. Every page is identified by a doc id. The indexer reads the pages from the repository, parses them and distributes their words into various barrels, while the doc index file stores information about each document. All the links found on the webpages, along with their anchor text, are diverted into an anchors file. The URL resolver reads the anchors file, converts the links into doc ids and associates the anchor text with the pages the links point to. It also produces a links file, a database of pairs of linked pages, from which the PageRanks of all documents are computed using the PageRank algorithm. The sorter then re-sorts the barrels by word to build the index that is searched at query time. Finally, the searcher uses this index together with the PageRanks to return the ranked pages to the user.
BING
The search engine striving hardest to compete with Google is Bing, which was introduced by Microsoft in 2009. Bing's search strategy is based on two important concepts: relevancy and click distance.
* Relevancy: To provide the most relevant information to its users, Bing uses the frequency with which the keywords of a search appear in the parsed webpages.
Firstly, hash ids are generated for every keyword in the search, and the keywords are placed in a word frequency table (table I). As is evident from table I, if the query is “STEVEN SPIELBERG”, the word STEVEN occurs 15, 12 and 7 times in documents 1, 2 and 3 respectively, whereas the word SPIELBERG occurs 17, 11 and 20 times in those documents. The term “STEVEN SPIELBERG” as a whole appears 18, 9 and 4 times in the documents. The full term appears most often in document 1 (18 times), so BING considers it the most relevant page.
* Click Distance: Along with relevancy, Bing uses click distance to rank the parsed pages. Click distance is the number of clicks it takes to reach a particular piece of content on a website; the fewer clicks it takes, the more accessible the content. To estimate click distance, Bing makes use of the URL depth property.
URL Depth Property: According to this property, the more slashes there are in the path of a URL, the longer it takes to reach the page that has the content. Hence, deeper URLs are given less importance when ranking webpages. An example illustrating this property is shown below.
QUERY: STEVEN SPIELBERG
http://www.imdb.com/name/nm0000229 (number of slashes in the path = 2)
http://www.dreamworksstudios.com/about/executives/steven-spielberg (number of slashes in the path = 3; this URL is also longer)
If the query is STEVEN SPIELBERG, BING ranks the first URL, with 2 slashes, in the first position, as it has fewer slashes than the second URL, which is longer and has 3 slashes.
YAHOO
Yahoo was developed as a prototype search engine by David Filo and Jerry Yang in 1994. They initially named it “JERRY'S AND DAVID'S GUIDE TO THE WORLD WIDE WEB”, which was later changed to “YAHOO”, an acronym for “YET ANOTHER HIERARCHICALLY OFFICIOUS ORACLE”.
In its initial stages, Yahoo was just a directory of websites arranged in a hierarchical order. It later grew and acquired 58 companies by 2009, 47 of them from the United States and the rest from other countries. It began by acquiring a company called Inktomi in 2002, and went on to buy Overture Services Inc. in 2003. Although it had acquired Overture, Yahoo still mainly used Google's search results. In 2004, Yahoo stopped using the service Google was providing, switched to the search technologies it had bought by then, and developed its own web crawler, called “Yahoo Slurp”. To return better search results, Yahoo updated its search and introduced “Search Assist”. “Build your Own Search Service” (BOSS), introduced by Yahoo in 2008, gave aspiring developers a chance to build new search engines on Yahoo's platform. Bing began powering Yahoo search in 2010 under an agreement between Microsoft and Yahoo, which also provided that the ads on Bing would be sold on Yahoo as well. In April 2015, Yahoo amended its existing agreement with Bing so that Bing would power only its desktop searches, while for mobile devices Yahoo would use its own search algorithm. Yahoo is also in an alliance with Google, until 2018, for providing part of its search results. However, Yahoo has not disclosed the search algorithm it uses to return web pages. On close observation, Yahoo returns results similar to Bing's; hence, Yahoo is essentially Bing with a different logo.
SEARCH WAR
Over the years, Google has gained immense popularity in most parts of the world. It is now considered the most optimized search engine, but it still keeps updating certain parameters to improve the performance of its search engine as much as possible.
With this being said, Bing, which was only introduced in 2009, and Yahoo, which has been in the market for a long time, are competing with Google to reach the top-notch standards it has set. There is thus a search war, spurred by their rivalry with Google, and it might spark a new revolution in the search engine industry. To evaluate the performance of a search engine, three parameters are considered:
# Technology: the search technology the engine uses.
# Frequency: how often users visit the search engine.
# Demographic Information: the number of male and female users who use the search engine.
Search engine preference in the United States as of 2015:
* 80% of users preferred Google
* 8% of users preferred Bing
* 6% of users preferred Yahoo
* The remaining 6% preferred other search engines
The reasons why people prefer Google:
* User friendly
* More relevant information
* Advertisements relevant to the search
* Faster in returning search results
* Blocking spam
The reasons why people prefer Yahoo/Bing:
* Social media friendly
* Result pages are attractive
* Advertisements have less competition
* Bing gives reward points for searching